ABSTRACT

In  general,  network  traffic  data  has  a  heavy-tailed 
probability  distribution.  The  Entropy-Based  Heavy  Tailed 
Distribution  Transformation  (EHTDT)  has  been  developed  to 
convert  the  heavy  tailed  network  traffic  data  distribution  into  a 
transformed probability distribution. In practice, the entropy
distribution of the transformed probability distribution exhibits
a type of linearity that gives rise to an eigenstructure. This
eigenstructure, which is motivated by singular value decomposition
theory, characterizes network traffic data and allows the data to be
lossily compressed via the Rate Controlled Eigen-Based Coding. A very
high compression ratio can be achieved by the proposed method.
Results of applying the methods to real network traffic data are
presented.

INTRODUCTION 

Intrusion Detection Systems (IDSs) must be capable of
detecting unknown attacks. The problem with building an
anomaly detection model is that observed activities deviate
significantly from established normal usage profiles. Reliable
anomaly detection modeling requires training on huge datasets
regularly in order to learn legitimate behaviors. There is an
enormous cost in collecting, storing, and analyzing intrusion
datasets. A difficult problem in handling intrusion detection
data is that current data mining and data management
technologies cannot efficiently store and manage such a huge
amount of data.

Anomaly detection usually involves computation on massive
datasets. There has been an increased interest in data mining
based approaches for intrusion detection.


The  major  difficulty  of  data  mining  is  that  it  is  computationally 
expensive  to  find  correlations  between  attributes  in  massive 
intrusion  detection  datasets.  It  is  desirable  to  perform  statistical 
processing  on  reduced  datasets  instead  of  the  original  full 
datasets.  The  reduced  data  sets  must  of  course  contain  enough 
information  for  effective  segmentation  and  classification.  To 
efficiently  measure  similarity  in  appearance  within  object 
classes,  one  must  first  determine  which  features  are  most 
effective at describing anomalies of objects. A standard linear
method for data feature extraction is principal component
analysis (PCA). Dimensionality reduction is achieved by
selecting the first few principal components. These components
capture the most relevant features used to classify a group of
objects to be recognized. However, intrusion detection
technologies based on PCA are still immature because of
dynamic behaviors and heavy tailed distributions in network
traffic.

The study of heavy tailed distributions in network traffic
has been an important research topic in various network
applications [1] [2] [4] [5] [6] [7] [8] [10]. Characterization of
heavy tailed network traffic plays a critical role in improving
Quality of Service. More efficient intrusion detection data
modeling and management methods are required to characterize
heavy tailed network traffic data with greater reliability and
faster retrieval rates.

This paper combines network traffic characterization with
PCA in order to minimize model complexity and maintenance
problems in IDS design. The proposed Entropy-Based Heavy Tailed
Distribution Transformation and the Rate Controlled Eigen-Based
Coding method are effective methods to extract meaningful
features from heavy tailed datasets. These feature extraction
functions are useful for traffic analyzers and intrusion
detection tools.

STATISTICAL  ANOMALY  DETECTION  MODELING 

There has recently been a large increase in the number of
studies that use statistical analysis to characterize traffic
traces. One of the open problems in understanding the dynamic
nature of network traffic arises when the statistical
distributions of traffic traces are non-Gaussian and heavy
tailed [3] [9]. Heavy tails refer to the power-law decrease of
the marginal distributions. It is evident that many important
problems with heavy tailed anomalies are poorly described by
standard statistical models.

This  research  aims  to  develop  a  new  statistical  model  to 
represent  a  heavy  tailed  distribution  in  a  compact  form  with 
great  generality  and  several  feature  extraction  properties.  A 
wide  range  of  shapes  of  the  distribution  can  be  investigated  by 
choosing the parameters. This approach is novel because the
parameter estimators are extracted specifically for use in
anomaly detection.

Future  research  will  focus  on  temporal  granularity  and 
statistical  characteristics,  how  to  detect  and  measure  these 
quantities  and  identify  other  potential  characteristics,  especially 
within  apparent  heavy  tailed  regions.  In  this  approach,  principal 
component  based  statistical  characteristics  are  extracted  from 
the  heavy  tailed  distribution  data,  and  stored  in  a  database  that 
is  updated  regularly  and  automatically  to  determine  dynamic 
thresholds  for  discriminant  functions. 

ENTROPY-BASED  HEAVY  TAILED  DISTRIBUTION 
TRANSFORM  (EHTDT) 

Network traffic characterization has been studied
extensively, but an accurate characterization of network traffic
still remains elusive due to the difficulty of parameter
estimation. This section describes the transformation procedures
used to characterize network traffic data and illustrates them
with real network traffic data. The simple transformation
process has the ability to predict the behavior of large-scale
network traffic.


[Plot: bin size=10, 10 seconds; horizontal axis: t (hour)]

Figure 1. PLOTS OF DAILY CONNECTIONS.


A plot of real network traffic connections is shown in
Figure 1. Frequency and ordering properties of network traffic
datasets are important features of anomaly detection models.
The most common way to detect anomalies is to use statistical
distributions represented by a discrete distribution with a
specified number of bins and the relative frequency of a value
appearing in each bin. Real network traffic exhibits heavy
tailed distributions, as shown in Figure 2.
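
The binned relative-frequency representation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `difference_pmf`, the number of bins, and the synthetic connection counts are assumptions introduced here, and the synthetic series only stands in for a real trace.

```python
import numpy as np

def difference_pmf(connections, num_bins):
    """Relative-frequency histogram of the first-order differences."""
    # diff(connections(t)) = connections(t+1) - connections(t)
    diffs = np.diff(np.asarray(connections, dtype=float))
    counts, edges = np.histogram(diffs, bins=num_bins)
    pmf = counts / counts.sum()   # relative frequency of each bin
    return pmf, edges

# Synthetic, purely illustrative connection counts (not data from the paper).
rng = np.random.default_rng(0)
bursts = (40 * rng.pareto(1.5, size=1000)).astype(int)   # occasional large bursts
connections = rng.poisson(lam=50, size=1000) + bursts
pmf, edges = difference_pmf(connections, num_bins=21)
```

The resulting `pmf` plays the role of the two-tailed probability vector that the transform in the following sections takes as input.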

One of the most challenging aspects of heavy tailed
distributions is parameter estimation. Known statistical
procedures can be used to estimate parameters, but they are
infeasible for real-time network traffic characterization
because of their computational complexity. This research
addresses a new method for estimating the parameters of heavy
tailed distributions using the EHTDT.


Figure 2. PLOTS OF DAILY CONNECTIONS, THE FIRST
ORDER DIFFERENCE ALONG CONNECTIONS AND HISTOGRAM
OF HEAVY TAILS.

Note: diff(connections(t)) = connections(t+1) - connections(t).

Figure 3. PLOTS OF PROBABILITY MASS FUNCTION ($P$),
DIFFERENTIAL OF ENTROPY-BASED HEAVY TAILED
DISTRIBUTION ($\xi$) AND ENTROPY-BASED HEAVY TAILED
DISTRIBUTION ($\Psi$).


Figure 4. PLOTS OF TRANSFORMED PROBABILITY MASS
FUNCTION ($\hat{P}$), DIFFERENTIAL OF ENTROPY-BASED HEAVY TAILED
DISTRIBUTION ($\xi$) AND ENTROPY-BASED HEAVY TAILED
DISTRIBUTION ($\Psi$).


In general, network traffic data have heavy-tailed
distributions. Power-law distributions are widely used for
estimating packet interval time as well as in other networking
applications, and $\varepsilon$-contaminated (Gaussian-mixture)
distributions are useful for detecting anomalous network
traffic. A power-law distribution is one-tailed and an
$\varepsilon$-contaminated (Gaussian-mixture) distribution is
two-tailed. For statistical anomaly detection, two-tailed
distributions have been considered to derive the EHTDT.


Forward  EHTDT 

Figure 3 presents a probability mass function, the differential
of its entropy-based heavy tailed distribution, and the
entropy-based heavy tailed distribution itself. Experimental
results indicate that these heavy tailed distributions are
difficult to use to characterize network traffic. Hence, the
Forward EHTDT has been developed for fast network traffic
characterization.

A  two-tailed  probability  mass  function  is  defined  as  a 
probability  vector 

$P = (P(1), P(2), \ldots, P(K-1), P(K), P(K+1), \ldots, P(N))$  (1)


where  P(K)  is  the  unique  maximum  element  of  P  and  N  is  the 
maximum  index  number. 

To characterize network traffic in a more compact form, the
probability vector $P$ is converted into a transformed probability
vector $\hat{P}$ by the following two procedures:

The vector $p$ is defined, such that

$p(x) = \frac{1 - P(x)}{\lambda}$  (2)

where $x = 1, 2, \ldots, N$ and

$\lambda = \sum_{x=1}^{N} (1 - P(x))$

is the normalization factor.

The transformed probability vector $\hat{P}$ is then defined by

$\hat{P} = (p(K), p(K-1), \ldots, p(1), p(N), p(N-1), \ldots, p(K+1))$  (3)

where $\hat{P}(1) = p(K)$ is the minimum element of $\hat{P}$.
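
The two procedures in (2) and (3) can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function name `forward_transform` and the six-element example vector are hypothetical and do not come from the paper. Because $P$ sums to one, the normalization factor reduces to $\lambda = N - 1$, which the example confirms numerically.

```python
import numpy as np

def forward_transform(P):
    """Return (P_hat, lam, K) for a two-tailed probability vector P."""
    P = np.asarray(P, dtype=float)
    K = int(np.argmax(P)) + 1      # 1-based index of the unique maximum P(K)
    lam = np.sum(1.0 - P)          # normalization factor; N - 1 when P sums to 1
    p = (1.0 - P) / lam            # Eq. (2)
    # Eq. (3): (p(K), p(K-1), ..., p(1), p(N), p(N-1), ..., p(K+1))
    P_hat = np.concatenate([p[:K][::-1], p[K:][::-1]])
    return P_hat, lam, K

# Hypothetical two-tailed probability vector with its peak at K = 3.
P = np.array([0.05, 0.10, 0.60, 0.15, 0.07, 0.03])
P_hat, lam, K = forward_transform(P)
```

Only $\lambda$ and $K$ need to be kept alongside $\hat{P}$ for the reordering and scaling to be undone later.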

The Entropy-Based Heavy Tailed Distribution $\Psi$ is
defined by

$\Psi(x) = \sum_{j=1}^{x} \xi(j)$  (4)

where $\xi(j) = -\hat{P}(j) \log_2 \hat{P}(j)$ and $x = 1, 2, \ldots, N$.

Note that the last element of $\Psi$ is the entropy of the
transformed probability vector $\hat{P}$:

$\Psi(N) = \sum_{x=1}^{N} \xi(x) = -\sum_{x=1}^{N} \hat{P}(x) \log_2 \hat{P}(x) = H(\hat{P})$.  (5)
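
Equations (4) and (5) amount to a cumulative sum of the per-element entropy terms. The sketch below assumes a small transformed vector chosen for illustration; the function name `entropy_distribution` and the convention that a zero probability contributes zero are assumptions made here.

```python
import numpy as np

def entropy_distribution(P_hat):
    """Return (psi, xi) where psi is the cumulative sum of the entropy terms."""
    P_hat = np.asarray(P_hat, dtype=float)
    xi = np.zeros_like(P_hat)
    nz = P_hat > 0                               # assumption: 0 * log2(0) := 0
    xi[nz] = -P_hat[nz] * np.log2(P_hat[nz])     # xi(j) = -P_hat(j) log2 P_hat(j)
    psi = np.cumsum(xi)                          # Eq. (4)
    return psi, xi

# Hypothetical transformed probability vector (sums to one).
P_hat = np.array([0.08, 0.18, 0.19, 0.194, 0.186, 0.17])
psi, xi = entropy_distribution(P_hat)
H = -np.sum(P_hat * np.log2(P_hat))              # Shannon entropy of P_hat
```

Since every term of the sum is non-negative, `psi` is non-decreasing, and its last element equals `H` as stated in (5).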

The  main  reason  to  transform  data  as  part  of  a  regression 
analysis  is  to  achieve  linearity.  In  practice,  the  proposed 
transformation  provides  approximate  linearity  as  shown  in 
Figure  4. 


Data reduction and feature selection are two reasons to
compress the nonlinear function $\eta$ (the nonlinear part of
$\Psi$ defined in (9)). Fourier coefficients, wavelet
coefficients, and principal components are commonly selected
for features. In this approach, the principal component
analysis (PCA) will be applied to select features and analyze
anomaly detection data sets. The estimated nonlinear function
$\hat{\eta}$ can be determined with a few principal components.


Inverse  EHTDT 

The Inverse Entropy-Based Heavy Tailed Distribution
Transform can be determined by the following procedure. The
first order differences of $\Psi$ are used to determine $\xi$ as follows:

$\xi(x) = \Delta(\Psi) = \Psi(x) - \Psi(x-1) = -\hat{P}(x) \log_2 \hat{P}(x)$  (6)

where $\xi(1)$ is a stored parameter and $x = 2, 3, \ldots, N$.

It is emphasized that one can deal directly with $\hat{P}$ (or an
estimate of $\hat{P}$ via an iteration technique), and then obtain the
initial heavy tailed probability vector $P$ from $\hat{P}$. Inverting (3),
the probability vector $p$ is obtained as

$p = (\hat{P}(K), \hat{P}(K-1), \ldots, \hat{P}(1), \hat{P}(N), \ldots, \hat{P}(K+1))$  (7)

Then, $P$ can be calculated via

$P(x) = 1 - \lambda p(x)$  (8)

where the normalization factor $\lambda$ is a stored parameter and
$x = 1, 2, \ldots, N$.
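
A possible realization of the inverse procedure is sketched below. The paper does not specify the iteration technique, so the bisection solver here is an assumption, as are the function names. Bisection works because $-p \log_2 p$ is strictly increasing on $(0, 1/e]$ and every element of $\hat{P}$ is at most $1/(N-1)$, which lies below $1/e$ whenever $N \geq 4$.

```python
import numpy as np

def invert_entropy_term(xi, iters=60):
    """Solve -p*log2(p) = xi for p in (0, 1/e] by bisection (elementwise)."""
    xi = np.asarray(xi, dtype=float)
    lo = np.zeros_like(xi)
    hi = np.full_like(xi, 1.0 / np.e)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        too_small = -mid * np.log2(mid) < xi
        lo = np.where(too_small, mid, lo)
        hi = np.where(too_small, hi, mid)
    return 0.5 * (lo + hi)

def inverse_ehtdt(psi, lam, K):
    """Recover P from psi using the stored parameters lam and K."""
    psi = np.asarray(psi, dtype=float)
    xi = np.diff(psi, prepend=0.0)        # Eq. (6), with xi(1) = psi(1)
    P_hat = invert_entropy_term(xi)
    # Eq. (7): the reordering of Eq. (3) is its own inverse
    p = np.concatenate([P_hat[:K][::-1], P_hat[K:][::-1]])
    return 1.0 - lam * p                  # Eq. (8)

# Round trip on a hypothetical probability vector.
P = np.array([0.05, 0.10, 0.60, 0.15, 0.07, 0.03])
lam, K = np.sum(1.0 - P), int(np.argmax(P)) + 1
p = (1.0 - P) / lam
P_hat = np.concatenate([p[:K][::-1], p[K:][::-1]])
psi = np.cumsum(-P_hat * np.log2(P_hat))
P_rec = inverse_ehtdt(psi, lam, K)
```

With an exact $\Psi$ the round trip reproduces $P$ to floating-point precision; with a compressed estimate of $\Psi$ the same steps yield an estimate of $P$.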

Very Low-Bit Rate EHTDT

The Entropy-Based Heavy Tailed Distribution $\Psi$ can be
decomposed as

$\Psi = \Phi + \eta$  (9)

the sum of a linear function $\Phi$ and a nonlinear function $\eta$,
where $\Phi$ is taken as

$\Phi(x) = \xi(1) + \frac{\Psi(N) - \Psi(1)}{N - 1}(x - 1)$  (10)

for $x = 1, 2, \ldots, N$. Note that $\Psi(1) = \xi(1)$ by (4).

$\Phi$ will contain most of the energy of the heavy tailed
information; the linear distribution of the heavy tailed
approximation can be estimated with the entropy of the inverse
distribution $\Psi(N)$ and $\xi(1)$. The nonlinear function $\eta$ is then

$\eta = \Psi - \Phi$.  (11)

The differential vector is given by

$\varepsilon(x) = \eta(x) - \eta(x-1)$  (12)

where $x = 2, 3, \ldots, N$.

The reconstructed function $\hat{\Psi}$ can be expressed as

$\hat{\Psi} = \Phi + \hat{\eta}$  (13)

The estimate of the heavy tailed probability vector $P$ can also
be determined from $\hat{\Psi}$ by the Inverse Entropy-Based Heavy
Tailed Distribution Transform. A very high compression ratio can be
achieved by the proposed methods. A few principal
components, $\Psi(N)$ and $\xi(1)$ are selected for features.

LOSSY COMPRESSION

Principal component analysis (PCA) is a popular technique
in many areas of multivariate analysis. There are various
generalizations of PCA such as multiple correspondence
analysis (MCA), non-metric principal component analysis
(NCA) and ordinary metric PCA. In correspondence analysis,
the variables are linearly transformed to provide orthogonal
solutions. First, we will briefly describe ordinary metric PCA
and Singular Value Decomposition (SVD)-Based Coding. Then,
the Rate Controlled Eigen-Based Coding for EHTDT is
introduced. The proposed coding system will reduce the
dimensionality of the data enormously and capture the effective
feature structure.

Principal Component Analysis (PCA)

Suppose that $f_1, f_2, \ldots, f_M$ are $N \times 1$ observation vectors.

Let $\mu$ be the mean vector of the observation vectors
$f_1, f_2, \ldots, f_M$. Zero mean observation vectors are given by

$\phi_x = f_x - \mu$  (14)

where $x = 1, 2, \ldots, M$.

The empirical covariance matrix $S$ is computed as

$S = \frac{1}{M} \sum_{x=1}^{M} \phi_x \phi_x^T$  (15)


The unique set of $M$ orthonormal eigenvectors of $S$,
$Q_M = [q_1, q_2, \ldots, q_M]$, and their associated eigenvalues,
$\lambda_1, \lambda_2, \ldots, \lambda_M$, are computed. Linear combinations of the first $L$
eigenvectors $Q_L = [q_1, q_2, \ldots, q_L]$ corresponding to the $L$ largest
eigenvalues (e.g., $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_L$) span the space of the zero
mean observation vectors and capture most of the relevant
information in the input data. The projection of the vector $\phi_x$
onto the lines spanned by the orthonormal basis
$Q_L = [q_1, q_2, \ldots, q_L]$ is given by the following operation

$p_x = Q_L^T \phi_x = (p_{1x}, \ldots, p_{Lx})^T$  (16)

where $1 \leq x \leq M$.

The elements of vector $p_x$ are called the principal components.
The $L \times 1$ principal component vector $p_x$ contains compact
information for $f_x$. The reconstructed vector $\hat{f}_x$ can be
computed as

$\hat{f}_x = Q_L p_x + \mu$  (17)

where $1 \leq x \leq M$.
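
The PCA steps in (14) to (17) can be illustrated with a short NumPy sketch. The function name `pca_encode_decode` and the synthetic rank-2 data are assumptions made for this example; `numpy.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def pca_encode_decode(F, L):
    """F is N x M with the observation vectors f_1..f_M as columns."""
    M = F.shape[1]
    mu = F.mean(axis=1, keepdims=True)        # mean vector
    centered = F - mu                         # Eq. (14)
    S = (centered @ centered.T) / M           # Eq. (15), N x N
    eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:L]     # indices of the L largest
    Q_L = eigvecs[:, order]                   # N x L
    P = Q_L.T @ centered                      # Eq. (16), L x M
    F_hat = Q_L @ P + mu                      # Eq. (17)
    return P, F_hat, eigvals[order]

# Synthetic data whose centered columns lie in a 2-dimensional subspace.
rng = np.random.default_rng(1)
F = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 20)) + 3.0
P, F_hat, top_eigvals = pca_encode_decode(F, L=2)
```

Because the centered data in this example has rank 2, two principal components reconstruct it to floating-point precision; real data would generally leave a residual.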

Singular  Value  Decomposition  (SVD) 

The singular value decomposition (SVD) is relevant to
principal component analysis in several respects. The basic
concept is to represent the given data as a matrix $X$ of size
$N \times K$. SVD is then applied to this matrix to obtain the
matrices $U$, $S$, and $V$. This compression operation is expressed
in the following equations. The singular value decomposition is
given by

$X = U S V^T$  (18)

where the dimensions of $X$, $U$, $S$, and $V$ are $N \times K$, $N \times K$,
$K \times K$, and $K \times K$, respectively.

Reconstructed data is computed by

$\hat{X} = \hat{U} \hat{S} \hat{V}^T$  (19)

where $L < K$ and the dimensions of $\hat{X}$, $\hat{U}$, $\hat{S}$, and $\hat{V}$ are
$N \times K$, $N \times L$, $L \times L$, and $K \times L$, respectively.

The columns of $U$ are called the left singular vectors. The rows
of $V^T$ contain the elements of the right singular vectors. The
elements of $S$ are only nonzero on the diagonal, and are called
the singular values. For example, if $\mathrm{rank}(X) = L$, then

$\mathrm{diag}(S) = (s_1, s_2, \ldots, s_L)$  (20)

where $s_1 \geq s_2 \geq \ldots \geq s_L > 0$.

Note that for a square, symmetric, positive semidefinite matrix
(such as a covariance matrix), the singular value decomposition is
equivalent to diagonalization, or solution of the eigenvalue
problem. The SVD-based compression method is a popular way to
compress large data matrices.
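
The truncation in (19) can be sketched with `numpy.linalg.svd`. The function name `truncated_svd` and the random test matrix are assumptions for illustration. The example also checks the standard property that the Frobenius norm of the approximation error equals the root sum of squares of the discarded singular values.

```python
import numpy as np

def truncated_svd(X, L):
    """Rank-L approximation of X (N x K, N >= K) from its leading singular triplets."""
    # With full_matrices=False: U is N x K, s has K entries, Vt is K x K.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_L, s_L, Vt_L = U[:, :L], s[:L], Vt[:L, :]    # N x L, L, L x K
    X_hat = U_L @ np.diag(s_L) @ Vt_L              # Eq. (19)
    return X_hat, s

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))
X_hat, s = truncated_svd(X, L=2)
err = np.linalg.norm(X - X_hat)                    # Frobenius norm of the residual
expected = np.sqrt(np.sum(s[2:] ** 2))             # from the discarded singular values
```

Storing the truncated factors takes $L(N + K + 1)$ numbers instead of $NK$, which is where the compression comes from when $L$ is small.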

The fundamental concept of the SVD-based compression
scheme is to use a smaller number of dimensions to
approximate the original matrix. The SVD does not provide a
computationally efficient method of compression. However, the
importance of using the SVD for principal component analysis
is that SVD provides the standardized versions of principal
component scores. Component scores are useful for
correspondence analysis.

Rate Controlled Eigen-Based Coding for EHTDT

The  classes  of  admissible  transformations  in  SVD  are 
different  for  different  types  of  data.  Admissible  transformations 
should  be  found  to  minimize  the  appropriate  loss  function.  It  is 
common  to  calculate  the  principal  components  using  a 
covariance  matrix.  The  reason  is  that  eigenvectors  of  a 
covariance  matrix  may  provide  admissible  transformations. 
There  are  other  ways  of  computing  principal  components.  In 
one  method,  eigenvectors  of  a  correlation  matrix  are  used  to 
compute  principal  components  with  standardized  variables. 
By contrast, the principal component based EHTDT coding is
simple to implement. Even though the loss function is not
minimized in the coding scheme, the scheme does reduce the
computational complexity and the misclassification rate.


Figure  5.  RATE  CONTROLLED  EIGEN-BASED  CODING  REGIONS. 

Suppose that $f_1, f_2, \ldots, f_M$ are $N \times 1$ nonzero mean
observation vectors

$f(x) = \varepsilon(x)$  (21)

where $x = \{2, 3, \ldots, T, N-T, \ldots, N\}$ and $\varepsilon(x) = \eta(x) - \eta(x-1)$.

$T$ is a threshold determined from the $\varepsilon(x)$ plot as indicated in
Figure 5. For $x$ in between $T$ and $N-T$, $\varepsilon(x)$ can be taken as
zero for all practical purposes; this gives $f(x)$ equal to zero in
this range of $x$ between $T$ and $N-T$.
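
The construction of one observation vector from $\Psi$, following (10) to (12) and (21), might look like the sketch below. Several details are assumptions made here rather than statements from the paper: the function name `observation_vector`, the Cauchy-like synthetic distribution, the value $T = 8$ (the paper reads $T$ off a plot), and setting the undefined first element $f(1)$ to zero.

```python
import numpy as np

def observation_vector(psi, T):
    """Return (phi, eta, f) for an entropy-based distribution psi and threshold T."""
    psi = np.asarray(psi, dtype=float)
    N = psi.size
    x = np.arange(1, N + 1)
    # Eq. (10): straight line through (1, psi(1)) and (N, psi(N)); psi(1) = xi(1)
    phi = psi[0] + (psi[-1] - psi[0]) / (N - 1) * (x - 1)
    eta = psi - phi                            # Eq. (11)
    eps = np.diff(eta, prepend=eta[0])         # Eq. (12); first entry set to 0 (assumption)
    f = eps.copy()
    f[(x > T) & (x < N - T)] = 0.0             # Eq. (21): zero between T and N - T
    return phi, eta, f

# Synthetic two-tailed, heavy tailed probability vector (illustrative only).
grid = np.arange(41) - 20.0
P = 1.0 / (1.0 + grid ** 2)
P /= P.sum()
K = int(np.argmax(P)) + 1
p = (1.0 - P) / np.sum(1.0 - P)
P_hat = np.concatenate([p[:K][::-1], p[K:][::-1]])
psi = np.cumsum(-P_hat * np.log2(P_hat))
phi, eta, f = observation_vector(psi, T=8)
```

Only the entries of `f` near the two ends of the index range remain nonzero, and these are what the eigen-based coding then compresses.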

The empirical covariance matrix of the nonzero mean
observation vectors is defined for computational simplicity.

The $N \times N$ empirical covariance matrix $R$ is defined by

$R = \frac{1}{M} \sum_{x=1}^{M} f_x f_x^T$  (22)

where the matrix $R$ is symmetric and almost always nonsingular.

Instead of using the empirical covariance matrix $S$ or the ordinary
empirical correlation matrix, the empirical covariance matrix $R$
of the nonzero mean observation vectors is used for
computational simplicity. The empirical covariance matrix $S$
ensures that the first principal component corresponds to a line
that passes through the mean and minimizes the mean square
error of approximating the data. On the other hand, the
empirical covariance matrix $R$ of the nonzero mean
observation vectors provides an effective cluster separation
for each streaming network traffic dataset. For anomaly
detection, the empirical covariance matrix $R$ minimizes
computational complexity and maximizes detection rate.

The unique set of $N$ orthonormal eigenvectors of the matrix $R$
is computed, and the $L$ eigenvectors corresponding to the $L$
largest eigenvalues form an $N \times L$ eigenvector matrix $Q_L$. The
first $L$ eigenvectors are $Q_L = [q_1, q_2, \ldots, q_L]$ and their
associated eigenvalues are $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_L$.


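The simple coding pairs are given by

$P = Q_L^T X$  (23)

and

$\hat{X} = Q_L P$  (24)

where $N > L$ and $X = [f_1, f_2, \ldots, f_M]$.

Note that the columns of the $L \times M$ matrix $P = [p_1, p_2, \ldots, p_M]$ are
the $L$-dimensional principal component vectors $p_1, p_2, \ldots, p_M$
for the $N$-dimensional vectors $f_1, f_2, \ldots, f_M$, and the columns of the
$N \times M$ matrix $\hat{X} = [\hat{f}_1, \hat{f}_2, \ldots, \hat{f}_M]$ are the reconstructed
$N$-dimensional vectors $\hat{f}_1, \hat{f}_2, \ldots, \hat{f}_M$.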

This coding scheme is simpler than the Karhunen-Loève
transform and other principal component analysis techniques.
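
The coding pair (23) and (24), built on the matrix $R$ of (22), can be sketched as follows. The function name `eigen_code` and the synthetic rank-2 data are assumptions for illustration. Unlike the ordinary PCA of (14) to (17), no mean vector is subtracted or stored.

```python
import numpy as np

def eigen_code(X, L):
    """X is N x M with columns f_1..f_M, used without mean removal."""
    M = X.shape[1]
    R = (X @ X.T) / M                          # Eq. (22), N x N
    eigvals, eigvecs = np.linalg.eigh(R)       # R is symmetric
    order = np.argsort(eigvals)[::-1][:L]      # indices of the L largest eigenvalues
    Q_L = eigvecs[:, order]                    # N x L
    P = Q_L.T @ X                              # Eq. (23), L x M
    X_hat = Q_L @ P                            # Eq. (24), N x M
    return Q_L, P, X_hat

# Synthetic observation vectors that span a 2-dimensional subspace.
rng = np.random.default_rng(3)
X = rng.normal(size=(12, 2)) @ rng.normal(size=(2, 30))
Q_L, P, X_hat = eigen_code(X, L=2)
stored = Q_L.size + P.size                     # numbers kept by the coder
original = X.size                              # numbers in the uncoded data
```

In this toy case 84 stored numbers replace 360, and the reconstruction is exact because the data has rank 2; the rate is controlled by the choice of $L$.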


CONCLUSION 

Power-law distributions are widely used for estimating
packet interval time as well as in other networking applications,
and $\varepsilon$-contaminated (Gaussian-mixture) distributions are
useful for detecting anomalous network traffic. The estimation of
important tail characteristics is directly linked to the
interpretation of the underlying network traffic. Because the
mixture distribution characteristics of heavy tailed network
traffic limit the estimation of parameters for various heavy
tailed distributions, an efficient and practical parameter
estimation technique has not been derived. For statistical
anomaly detection, heavy-tailed probability distributions of
network traffic data have been proposed to mitigate the
limitation of parameter estimation. In this work, the EHTDT
converts such a heavy tailed distribution into a transformed
probability distribution more amenable to lossy compression of
network traffic data.

Experimental  results  indicate  that  a  compact 
characterization  of  heavy  tailed  network  traffic  data  can  be 
achieved  by  the  EHTDT  transform  and  the  Rate  Controlled 
Eigen-Based  Coding  approaches.  Efficient  intrusion  detection 
data  modeling  can  be  developed  by  the  proposed  approaches 
using  various  network  traffic  features. 

REFERENCES 

[1]  Baiardi, F., Telmon, C., and Sgandurra, D., 2009,
“Modeling and managing risk in billing infrastructures”. In
Critical Infrastructure Protection III, IFIP Advances in
Information and Communication Technology, Vol. 311, C.
Palmer and S. Shenoi, eds., Boston: Springer, pp. 51-64.

[2]  Crovella, M., 2001, “Performance Evaluation with Heavy
Tailed Distributions”. In Lecture Notes in Computer
Science, Vol. 2221. London: Springer-Verlag, pp. 1-10.

[3]  Dasgupta, A., Hopcroft, J., Kleinberg, J., and Sandler, M.,
2005, “On learning mixtures of heavy-tailed distributions”.
In Proceedings of the 46th Annual IEEE Symposium on
Foundations of Computer Science, pp. 491-500.

[4]  Dainotti, A., Pescape, A., and Ventre, G., 2006, “A packet-level
characterization of network traffic”. In Proceedings
of the 11th IEEE International Workshop on Computer-Aided
Modeling, Analysis and Design of Communication
Links and Networks.

[5]  Elleithy, K., and Al-Suwaiyan, A., 2001, “Network traffic
characterization for high-speed networks supporting
multimedia”. In IEEE Proceedings of the 34th Annual
Simulation Symposium, pp. 200.

[6]  Kornexl, S., Paxson, V., Dreger, H., Feldmann, A., and
Sommer, R., 2005, “Building a time machine for efficient
recording and retrieval of high-volume network traffic”. In
Proceedings of the 5th ACM SIGCOMM Conference on
Internet Measurement, pp. 267-272.

[7]  Maillart, T., and Sornette, D., 2009, “Heavy tailed
distribution of cyber-risks”. URL
http://arxiv.org/PS_cache/arxiv/pdf/0803/0803.2256v2.pdf.

[8]  Rezaul, K., and Grout, V., 2009, “An approach for
characterising heavy-tailed internet traffic based on EDF
statistics”. In Intelligent Engineering Systems and
Computational Cybernetics. Netherlands: Springer, pp.
173-184.

[9]  Vempala, S., and Wang, G., 2004, “A spectral algorithm
for learning mixture models”. Journal of Computer and
System Sciences, Vol. 68, Issue 4, Special issue on
FOCS, pp. 841-860.

[10]  Meza, J., Campbell, S., and Bailey, D., 2009,
“Mathematical and Statistical Opportunities in Cyber Security”.
URL
http://arxiv.org/PS_cache/arxiv/pdf/0904/0904.1616v1.pdf.


“This  material  is  declared  a  work  of  the  U.S.  Government  and  is  not  subject  to  copyright  protection  in  the  United  States.  Approved  for  public  release;  distribution  is  unlimited.” 




